
Motion Recognition


A Machine Learning-Based Multimodal Framework for Wearable Sensor-Based Archery Action Recognition and Stress Estimation

arXiv.org Artificial Intelligence

In precision sports such as archery, athletes' performance depends on both biomechanical stability and psychological resilience. Traditional motion analysis systems are often expensive and intrusive, limiting their use in natural training environments. To address this limitation, we propose a machine learning-based multimodal framework that integrates wearable sensor data for simultaneous action recognition and stress estimation. Using a self-developed wrist-worn device equipped with an accelerometer and a photoplethysmography (PPG) sensor, we collected synchronized motion and physiological data during real archery sessions. For motion recognition, we introduce a novel feature, Smoothed Differential Acceleration (SmoothDiff), and employ a Long Short-Term Memory (LSTM) model to identify motion phases, achieving 96.8% accuracy and a 95.9% F1-score. For stress estimation, we extract heart rate variability (HRV) features from PPG signals and apply a Multi-Layer Perceptron (MLP) classifier, achieving 80% accuracy in distinguishing high- and low-stress levels. The proposed framework demonstrates that integrating motion and physiological sensing can provide meaningful insights into athletes' technical and mental states. This approach offers a foundation for developing intelligent, real-time feedback systems for training optimization in archery and other precision sports.
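The abstract does not say which HRV features were extracted. As a hedged illustration of this step, the sketch below computes four standard time-domain HRV features from inter-beat intervals (IBIs): mean heart rate, SDNN, RMSSD and pNN50. It assumes that PPG peak detection has already produced IBIs in milliseconds. The function name `hrv_time_domain` and the choice of features are illustrative assumptions, not the authors' pipeline.

```python
import math
from statistics import mean, stdev


def hrv_time_domain(ibi_ms):
    """Time-domain HRV features from inter-beat intervals in milliseconds.

    Hypothetical helper: assumes PPG peaks were already detected and converted
    to successive inter-beat intervals.
    """
    if len(ibi_ms) < 3:
        raise ValueError("need at least three inter-beat intervals")
    # Successive differences between neighbouring intervals
    diffs = [b - a for a, b in zip(ibi_ms, ibi_ms[1:])]
    return {
        # Average heart rate in beats per minute (60 000 ms per minute)
        "mean_hr_bpm": 60000.0 / mean(ibi_ms),
        # SDNN: sample standard deviation of all intervals
        "sdnn_ms": stdev(ibi_ms),
        # RMSSD: root mean square of successive differences
        "rmssd_ms": math.sqrt(mean(d * d for d in diffs)),
        # pNN50: percentage of successive differences larger than 50 ms
        "pnn50_pct": 100.0 * sum(abs(d) > 50 for d in diffs) / len(diffs),
    }


features = hrv_time_domain([800, 810, 790, 800])
print(features)
```

In a pipeline like the one described, one such feature dictionary per analysis window would become the input vector for the stress classifier.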


Accurate online action and gesture recognition system using detectors and Deep SPD Siamese Networks

arXiv.org Artificial Intelligence

Human activity recognition is an important research topic in the pattern recognition field. It has been the subject of many studies over the past two decades because of its importance in numerous areas such as security, health, daily activity, energy consumption and robotics. Recently, some works on recognizing hand gestures or human actions from skeletal data have modeled the skeleton's movement with a manifold-based representation and proposed deep neural networks on this structure [1, 2, 3]. These approaches have demonstrated their potential for processing skeletal data. Most of them, however, are applied to offline human action recognition, which is useful in time-limited tasks. In many applications, simply recognizing a single gesture in a given segmented sequence is not enough. This is especially true for monitoring systems and virtual-reality devices, which need to detect human movements moment by moment in continuous videos. In these online recognition systems, it is important to detect the existence of an action as early as possible after it begins. It is also essential to determine the nature of the movement within a sequence of frames without any information about the number of gestures in the video, their start times or their durations, unlike in segmented action recognition. In this paper, we propose to use a manifold-based model to build an online motion recognition system that detects and identifies different human activities in unsegmented skeletal sequences.
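The abstract does not detail its model. One common way to give a skeletal window a manifold-based representation is to take the covariance of its joint coordinates, which is a symmetric positive definite (SPD) matrix, and to compare such matrices with the log-Euclidean distance. The sketch below applies this idea online. It slides a window over an unsegmented sequence and reports a label whenever a window lies close enough to a class prototype.

This is only a hedged illustration and is not the paper's method. The paper uses detectors and deep SPD Siamese networks. The function names, the nearest-prototype rule, and the window size, stride and threshold are all hypothetical.

```python
import numpy as np


def covariance_descriptor(window, eps=1e-6):
    """SPD descriptor of a (T, D) window: T frames, D flattened joint coordinates."""
    x = np.asarray(window, dtype=float)
    centered = x - x.mean(axis=0)
    cov = centered.T @ centered / (x.shape[0] - 1)
    # Small ridge keeps the matrix strictly positive definite
    return cov + eps * np.eye(x.shape[1])


def spd_log(m):
    """Matrix logarithm of an SPD matrix via its eigendecomposition V diag(log w) V^T."""
    w, v = np.linalg.eigh(m)
    return (v * np.log(w)) @ v.T


def log_euclidean_distance(a, b):
    """Frobenius norm of the difference of matrix logarithms."""
    return np.linalg.norm(spd_log(a) - spd_log(b), ord="fro")


def sliding_windows(sequence, size, stride):
    """Yield (start_frame, window) pairs over an unsegmented sequence."""
    for start in range(0, len(sequence) - size + 1, stride):
        yield start, sequence[start:start + size]


def online_detect(sequence, prototypes, size, stride, threshold):
    """Hypothetical online detector: yield (start_frame, label) for close windows."""
    for start, window in sliding_windows(sequence, size, stride):
        desc = covariance_descriptor(window)
        # Nearest class prototype under the log-Euclidean metric
        label, dist = min(
            ((name, log_euclidean_distance(desc, proto)) for name, proto in prototypes.items()),
            key=lambda pair: pair[1],
        )
        if dist < threshold:
            yield start, label


rng = np.random.default_rng(0)
seq = rng.normal(size=(40, 3))
protos = {"a": covariance_descriptor(seq[:20])}
print(list(online_detect(seq, protos, size=20, stride=20, threshold=1e-6)))
```

Because the detector only looks at the frames inside each window, it can emit a decision as soon as a window closes. It needs no prior knowledge of how many actions the stream contains or where they start.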


AutoMR: A Universal Time Series Motion Recognition Pipeline

arXiv.org Artificial Intelligence

In this paper, we present an end-to-end automated motion recognition (AutoMR) pipeline designed for multimodal datasets. The proposed framework seamlessly integrates data preprocessing, model training, hyperparameter tuning, and evaluation, enabling robust performance across diverse scenarios. Our approach addresses two primary challenges: 1) variability in sensor data formats and parameters across datasets, which traditionally requires task-specific machine learning implementations, and 2) the complexity and time consumption of hyperparameter tuning for optimal model performance. Our library features an all-in-one solution incorporating QuartzNet as the core model, automated hyperparameter tuning, and comprehensive metrics tracking. Extensive experiments demonstrate its effectiveness on 10 diverse datasets, achieving state-of-the-art performance. This work lays a solid foundation for deploying motion-capture solutions across varied real-world applications.
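QuartzNet's blocks are built from 1D time-channel separable convolutions. Each one is a per-channel (depthwise) temporal convolution followed by a pointwise 1x1 convolution that mixes channels. Splitting the operation this way greatly reduces the parameter count compared with a full 1D convolution. The NumPy sketch below shows only that operation and omits batch normalization, ReLU and residual connections. It is an assumption-based illustration, not code from the AutoMR library, and all function names are hypothetical.

```python
import numpy as np


def depthwise_conv1d(x, kernels):
    """x: (C, T) signal; kernels: (C, K) one temporal filter per channel, odd K, 'same' output."""
    c, t = x.shape
    k = kernels.shape[1]
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad)))
    out = np.empty((c, t))
    for ch in range(c):
        # np.convolve flips its kernel; pre-flip so this is cross-correlation, as in DL frameworks
        out[ch] = np.convolve(xp[ch], kernels[ch][::-1], mode="valid")
    return out


def pointwise_conv1d(x, weights):
    """weights: (C_out, C_in) 1x1 convolution mixing channels at every time step."""
    return weights @ x


def time_channel_separable_conv(x, kernels, weights):
    """Depthwise temporal filtering followed by pointwise channel mixing."""
    return pointwise_conv1d(depthwise_conv1d(x, kernels), weights)


def param_counts(c_in, c_out, k):
    """(full 1D conv, separable conv) weight counts, ignoring biases."""
    return c_out * c_in * k, c_in * k + c_out * c_in


print(param_counts(256, 256, 33))  # (2162688, 73984)
```

With 256 channels and a kernel width of 33, the separable form needs roughly 29 times fewer weights than a full convolution. This saving is the motivation for QuartzNet's design.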


Task-Oriented Integrated Sensing, Computation and Communication for Wireless Edge AI

arXiv.org Artificial Intelligence

With the advent of emerging IoT applications such as autonomous driving, digital twins and the metaverse, edge artificial intelligence (AI) has been proposed to bring the high-performance computation of a conventional cloud down to the network edge. These applications feature massive data sensing, analysis and inference, as well as critical latency requirements, in beyond-5G (B5G) networks. However, most existing design frameworks treat sensing, computation and communication separately, incurring unnecessary signaling overhead and wasted energy. It is therefore of paramount importance to advance fully integrated sensing, computation and communication (ISCC) to achieve ultra-reliable and low-latency edge intelligence acquisition. In this article, we provide an overview of the principles of enabling ISCC technologies. We follow it with two concrete use cases of edge AI tasks that demonstrate the advantage of task-oriented ISCC, and we point out some practical challenges in edge AI design with advanced ISCC solutions.

H. Xing and H. Wen are with the Internet of Things (IoT) Thrust, The Hong Kong University of Science and Technology (Guangzhou), Guangzhou 511453, China. H. Xing is also affiliated with the Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology, Hong Kong (e-mails: hongxing@ust.hk, ...). D. Liu is with the School of Computing Science, University of Glasgow, Glasgow G12 8RZ, United Kingdom (e-mail: dongzhu.liu@glasgow.ac.uk). K. Huang is with the Department of Electrical and Electronic Engineering (EEE), The University of Hong Kong, Hong Kong (e-mail: huangkb@eee.hku.hk).


When it comes to neural networks learning motion, it's all relative

#artificialintelligence

Seeking to explore the capabilities of neural networks for recognizing and predicting motion, a group of researchers led by Hehe Fan developed and tested a deep learning approach based on relative changes in position encoded as a series of vectors. They found that their method worked better than existing frameworks for modeling motion. The group's key innovation was to encode motion separately from position. The group's research was published in Intelligent Computing. When tested on motion recognition, the new method, VecNet LSTM, scored higher than six other artificial neural network frameworks from video research. Some of the other frameworks were merely weaker, while others were totally unsuitable for modeling motion.
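The article gives no implementation details for VecNet. The sketch below illustrates only the core idea it describes: motion is encoded as relative change, that is, frame-to-frame displacement vectors, and kept separate from absolute position. The function names are hypothetical, and the LSTM that would consume the displacement sequence is omitted. The check at the end shows why this encoding helps. Shifting the whole trajectory changes its positions but leaves its motion encoding unchanged.

```python
import numpy as np


def encode(positions):
    """Split a (T, D) trajectory into its starting position and (T-1, D) displacement vectors."""
    p = np.asarray(positions, dtype=float)
    return p[0], np.diff(p, axis=0)


def decode(origin, displacements):
    """Rebuild absolute positions by accumulating displacements from the origin."""
    return np.vstack([origin, origin + np.cumsum(displacements, axis=0)])


traj = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 2.0]])
shifted = traj + np.array([5.0, -3.0])

# Same motion at a different place: the displacement encodings are identical
print(np.array_equal(encode(traj)[1], encode(shifted)[1]))  # True
```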


3DPalsyNet: A Facial Palsy Grading and Motion Recognition Framework using Fully 3D Convolutional Neural Networks

arXiv.org Artificial Intelligence

The capability to perform facial analysis from video sequences has significant potential to have a positive impact in many areas of life. One such area is the medical domain, specifically aiding the diagnosis and rehabilitation of patients with facial palsy. With this application in mind, this paper presents an end-to-end framework, named 3DPalsyNet, for the tasks of mouth motion recognition and facial palsy grading. 3DPalsyNet uses a 3D CNN architecture with a ResNet backbone to make predictions for these dynamic tasks. It leverages transfer learning from a 3D CNN pre-trained on the Kinetics dataset for general action recognition, and the model is modified to apply joint supervised learning with center loss and softmax loss. 3DPalsyNet is evaluated on a test set of individuals with varying ranges of facial palsy and mouth motions. The results show classification accuracies of 82% and 86%, respectively, on these tasks. The effects of frame duration and loss function on the predictive quality of 3DPalsyNet were studied, and a shorter frame duration of 8 was found to perform best for this specific task. Combining center loss with softmax loss improved spatio-temporal feature learning compared with softmax loss alone, in agreement with earlier work in the spatial domain.
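The abstract names joint supervision with center loss and softmax loss but gives no formula. Below is a minimal NumPy sketch of the standard center-loss formulation from face recognition:

L = L_softmax + λ · L_C, where L_C = ½ ‖x_i − c_{y_i}‖².

Here x_i is a sample's feature vector and c_{y_i} is the learned center of its class. L_C is averaged over the batch in this sketch. The sketch also shows the usual per-class center update with rate α. The values of λ and α and the function names are hypothetical, and this is not 3DPalsyNet's code.

```python
import numpy as np


def softmax_cross_entropy(logits, labels):
    """Mean softmax cross-entropy; logits: (N, K), labels: (N,) integer classes."""
    z = logits - logits.max(axis=1, keepdims=True)  # shift for numerical stability
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()


def center_loss(features, labels, centers):
    """Half squared distance of each feature to its class center, averaged over the batch."""
    diffs = features - centers[labels]
    return 0.5 * np.mean(np.sum(diffs ** 2, axis=1))


def joint_loss(logits, features, labels, centers, lam=0.01):
    """Softmax loss plus a lambda-weighted center loss (lambda is a hypothetical default)."""
    return softmax_cross_entropy(logits, labels) + lam * center_loss(features, labels, centers)


def update_centers(features, labels, centers, alpha=0.5):
    """Move each class center toward the mean of that class's features in the batch."""
    new = centers.copy()
    for c in np.unique(labels):
        mask = labels == c
        delta = (centers[c] - features[mask]).sum(axis=0) / (1 + mask.sum())
        new[c] = centers[c] - alpha * delta
    return new
```

The softmax term separates the classes, and the center term pulls features of the same class together. This combination is the mechanism the abstract credits with better spatio-temporal feature learning than softmax loss alone.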


Use Cases

#artificialintelligence

PIX connects computers and devices with the physical world around them by enabling computer vision. The platform connects developers with a scalable, decentralized image database for use in disparate systems and platforms. PIX customers gain the knowledge, services, and information needed to develop decentralized augmented reality apps, for example apps that recognize and react to playing cards, chips, game pieces, etc.